# INT8 Quantization
## Bytedance BAGEL 7B MoT INT8
**Author:** Gapeleon · **License:** Apache-2.0 · **Task:** Text-to-Image · **Downloads:** 190 · **Likes:** 20

BAGEL is an open-source multimodal foundation model with 7B active parameters that supports both multimodal understanding and generation tasks.
## Meta Llama 3.1 8B Instruct Quantized.w8a8
**Author:** RedHatAI · **Task:** Large Language Model · **Tags:** Transformers, Multilingual · **Downloads:** 9,087 · **Likes:** 16

INT8-quantized version of Meta-Llama-3.1-8B-Instruct with both weights and activations quantized (w8a8), suitable for multilingual business and research applications.
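Checkpoints with a `.w8a8` suffix quantize both weights and activations to INT8 and are typically served with an INT8-aware engine such as vLLM. Below is a minimal serving sketch; the repo id is inferred from the listing and the generation settings are illustrative assumptions.

```python
# Minimal sketch: serving a w8a8 (INT8 weight + activation) checkpoint with vLLM.
# The repo id below is inferred from the listing; verify the exact name on the hub.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain w8a8 INT8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```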
## QwQ 32B INT8 W8A8
**Author:** ospatch · **License:** Apache-2.0 · **Task:** Large Language Model · **Tags:** Transformers, English · **Downloads:** 590 · **Likes:** 4

INT8-quantized version of QwQ-32B, optimized by reducing the bit-width of both weights and activations.
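For context on how W8A8 checkpoints like this are commonly produced: a one-shot SmoothQuant + GPTQ recipe with llm-compressor is a typical route. The sketch below is illustrative, not this checkpoint's actual recipe; the module paths, calibration dataset, and source repo id are assumptions to verify against your llm-compressor version.

```python
# Illustrative sketch: one-shot W8A8 quantization with llm-compressor.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot  # newer versions: from llmcompressor import oneshot

recipe = [
    # Migrate activation outliers into weights so activations quantize cleanly.
    SmoothQuantModifier(smoothing_strength=0.8),
    # INT8 weights and activations for Linear layers; keep lm_head in high precision.
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model="Qwen/QwQ-32B",             # assumed source repo id
    dataset="open_platypus",          # small calibration set (assumption)
    recipe=recipe,
    output_dir="QwQ-32B-INT8-W8A8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```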
## Qwen2.5 VL 7B Instruct Quantized.w8a8
**Author:** RedHatAI · **License:** Apache-2.0 · **Task:** Image-to-Text · **Tags:** Transformers, English · **Downloads:** 1,992 · **Likes:** 3

Quantized version of Qwen2.5-VL-7B-Instruct that accepts vision-text input and produces text output, optimized for inference efficiency through INT8 quantization of weights and activations (w8a8).
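Vision-language w8a8 checkpoints can be served the same way; with vLLM the image is passed alongside a chat-formatted prompt. A rough sketch, where the repo id and the Qwen-style prompt template are assumptions to check against the model card:

```python
# Rough sketch: image + text inference for a quantized Qwen2.5-VL model via vLLM.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8")  # assumed repo id

# Qwen-style chat template with an image placeholder (verify against the model card).
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": Image.open("example.jpg")}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```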
## DeepSeek R1 Distill Qwen 14B Quantized.w8a8
**Author:** neuralmagic · **License:** MIT · **Task:** Large Language Model · **Tags:** Transformers · **Downloads:** 765 · **Likes:** 2

Quantized version of DeepSeek-R1-Distill-Qwen-14B with INT8 weights and activations, reducing GPU memory requirements and improving computational efficiency.
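The memory claim is easy to sanity-check: 14B parameters at one byte each occupy roughly 14 GB of weights in INT8, versus roughly 28 GB at FP16/BF16 (two bytes per parameter), before accounting for activations and KV cache.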
## FLUX.1 Dev Qint8
**Author:** Disty0 · **License:** Other · **Task:** Text-to-Image · **Tags:** English · **Downloads:** 2,617 · **Likes:** 12

FLUX.1-dev is a text-to-image diffusion model quantized to INT8 format using Optimum Quanto, suitable for non-commercial use.
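Quantizing a FLUX pipeline with Optimum Quanto follows a quantize-then-freeze pattern. A minimal sketch, assuming the standard diffusers `FluxPipeline` and the upstream `black-forest-labs/FLUX.1-dev` weights rather than this exact repo:

```python
# Minimal sketch: INT8 weight quantization of the FLUX transformer with Optimum Quanto.
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qint8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Quantize the largest component (the transformer) to INT8, then freeze it
# to replace float weights with their integer representation.
quantize(pipe.transformer, weights=qint8)
freeze(pipe.transformer)

pipe.to("cuda")
image = pipe("a watercolor fox in falling snow", num_inference_steps=28).images[0]
image.save("fox.png")
```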
## BGE Large En V1.5 Quant
**Author:** RedHatAI · **License:** MIT · **Task:** Text Embedding · **Tags:** Transformers, English · **Downloads:** 1,094 · **Likes:** 22

Quantized (INT8) ONNX variant of BGE-large-en-v1.5 with inference acceleration via DeepSparse.
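DeepSparse ships a sentence-transformers-style wrapper for CPU inference on ONNX embedders like this one. A sketch, assuming the wrapper available in recent DeepSparse releases and an inferred repo id:

```python
# Sketch: CPU embedding inference with DeepSparse's sentence-transformers wrapper.
from deepsparse.sentence_transformers import SentenceTransformer

# Repo id inferred from the listing; verify the exact name on the hub.
model = SentenceTransformer("RedHatAI/bge-large-en-v1.5-quant", export=False)

embeddings = model.encode(["INT8 quantization trades a little accuracy for speed."])
print(embeddings.shape)  # bge-large produces 1024-dimensional vectors
```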
## RoBERTa Base Go Emotions ONNX
**Author:** SamLowe · **License:** MIT · **Task:** Text Classification · **Tags:** Transformers, English · **Downloads:** 41.50k · **Likes:** 20

ONNX version of the RoBERTa-base-go_emotions model, available in both full-precision and INT8-quantized variants for multi-label emotion classification.
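The ONNX repo can be consumed through Optimum's ONNX Runtime backend. In the sketch below, the path of the quantized graph inside the repo is an assumption to check against the repo's file list:

```python
# Sketch: multi-label emotion scoring with the INT8 ONNX variant via Optimum.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo = "SamLowe/roberta-base-go_emotions-onnx"
model = ORTModelForSequenceClassification.from_pretrained(
    repo, file_name="onnx/model_quantized.onnx"  # assumed path to the INT8 graph
)
tokenizer = AutoTokenizer.from_pretrained(repo)

# top_k=None returns scores for all 28 GoEmotions labels (multi-label output).
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)
print(classifier("Thanks, this genuinely made my day!")[0][:3])
```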
## DistilBERT Base Uncased Distilled SQuAD INT8 Static INC
**Author:** Intel · **License:** Apache-2.0 · **Task:** Question Answering · **Tags:** Transformers · **Downloads:** 1,737 · **Likes:** 4

INT8-quantized version of the DistilBERT base uncased model for question answering, with model size and inference speed optimized through post-training static quantization (Intel Neural Compressor).
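Checkpoints produced with Intel Neural Compressor load through optimum-intel's INC model classes. A sketch consistent with that integration; the class name and repo id below are assumptions to verify against your optimum-intel version:

```python
# Sketch: loading an Intel Neural Compressor INT8 checkpoint via optimum-intel.
from optimum.intel import INCModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

repo = "Intel/distilbert-base-uncased-distilled-squad-int8-static-inc"  # assumed id
model = INCModelForQuestionAnswering.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(qa(
    question="What does static quantization fix ahead of time?",
    context="Post-training static quantization calibrates and fixes activation "
            "scales before inference, unlike dynamic quantization.",
))
```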
## BERT Large Uncased Whole Word Masking SQuAD INT8 0001
**Author:** dkurt · **Task:** Question Answering · **Tags:** Transformers · **Downloads:** 23 · **Likes:** 0

BERT-large English Q&A model pre-trained with whole-word masking, fine-tuned on SQuAD v1.1, and quantized to INT8 precision.
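The `0001` suffix follows OpenVINO Open Model Zoo naming, so the model ships as an IR pair (an .xml graph plus .bin weights) and runs through the OpenVINO runtime. A sketch with assumed file and input names; inspect `model.inputs` if they differ:

```python
# Sketch: CPU inference for an OpenVINO IR (.xml/.bin) INT8 SQuAD model.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("bert-large-uncased-whole-word-masking-squad-int8-0001.xml")
compiled = core.compile_model(model, "CPU")

# Inspect the graph's actual input names/shapes before building real inputs.
for inp in model.inputs:
    print(inp.any_name, inp.shape)

# Dummy SQuAD-style inputs (the names and 384 sequence length are assumptions).
feed = {
    "input_ids": np.zeros((1, 384), dtype=np.int64),
    "attention_mask": np.ones((1, 384), dtype=np.int64),
    "token_type_ids": np.zeros((1, 384), dtype=np.int64),
}
result = compiled(feed)  # start/end logits for answer-span extraction
```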